Optimizing K-Means by Fixing Initial Cluster Centers

Authors

  • Neeti Arora
  • Mahesh Motwani
Abstract

Data mining techniques support business decision making and the prediction of behaviors and future trends. Clustering is a data mining technique that groups objects with similar characteristics. It analyzes data objects without consulting a known class label or category; that is, it is an unsupervised technique. K-means is a widely used partitional clustering algorithm, but its performance depends strongly on the initial guess of the cluster centers (centroids), and the final centroids may not be optimal. A good choice of initial centroids is therefore important for K-means. By augmenting K-means with a technique that selects centroids according to the sum of distances from each data object to all other data objects, we obtain the Farthest Distributed Centroids Clustering (FDCC) algorithm, which results in better clustering than not only the K-means partitional clustering algorithm but also the agglomerative hierarchical clustering algorithm and the hierarchical partitioning clustering algorithm. Unlike K-means, FDCC does not generate the initial centers randomly, so it does not produce different results for the same input data.
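The abstract does not spell out the FDCC selection rule, so the following is only a minimal sketch of one plausible reading of "sum of distances of data objects to all other data objects", not the authors' algorithm. It assumes the first center is the point with the largest total distance to all other points, and that each further center is the unchosen point with the largest summed distance to the centers already picked. The function names `sum_distance_init` and `kmeans` are hypothetical. The point it illustrates is the one the abstract makes: with no random step, the same input always gives the same result.

```python
import math


def sum_distance_init(points, k):
    """Deterministically pick k initial centers (assumed rule, see lead-in)."""
    n = len(points)
    # Total Euclidean distance from each point to every other point.
    totals = [sum(math.dist(p, q) for q in points) for p in points]
    # Assumption: start from the point with the largest total distance.
    chosen = [max(range(n), key=totals.__getitem__)]
    # Assumption: add the unchosen point with the largest summed
    # distance to the centers selected so far.
    while len(chosen) < k:
        best = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: sum(math.dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(best)
    return [tuple(points[i]) for i in chosen]


def kmeans(points, centers, max_iter=100):
    """Standard K-means (Lloyd) iterations from the given initial centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members;
        # a center with no members stays where it is.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    start = sum_distance_init(data, 2)
    final_centers, final_labels = kmeans(data, start)
    print(start, final_centers, final_labels)
```

Because the initial centers are a pure function of the data, repeated runs on the same input produce identical clusterings, in contrast to randomly seeded K-means. Computing all pairwise distances costs O(n^2) time, which this sketch makes no attempt to reduce.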


Similar resources

A Genetic K-means Clustering Algorithm Based on the Optimized Initial Centers

An optimized initial center K-means algorithm (PKM) is proposed, which selects the k most distant data points in the high-density area as the initial cluster centers. Experiments show that the algorithm not only depends weakly on the initial data, but also converges quickly and gives high clustering quality. To obtain effective and accurate clusters, we combine the optimized K-means a...


Modified K-Means for Better Initial Cluster Centres

The k-means clustering algorithm is very popular in data mining for real-world applications. The efficiency and performance of the k-means algorithm are greatly affected by the initial cluster centers, as different initial cluster centers often lead to different clusterings. In this paper, we propose a modified k-means algorithm which has additional steps for selecting better cluster centers. W...


Improved Fuzzy Art Method for Initializing K-means

The K-means algorithm is quite sensitive to the initially selected cluster centers and can produce different clusterings depending on these initial conditions. In this study, a new method based on the Fuzzy ART algorithm, called Improved Fuzzy ART (IFART), is used to determine the initial cluster centers. By using IFART, better quality clusters are achieved...


A new algorithm for choosing initial cluster centers for k-means

The k-means algorithm is widely used in many applications because of its simplicity and speed. However, its result is very sensitive to the initialization step, that is, the choice of initial cluster centers. Different initialization algorithms may lead to different clustering results and may also affect the convergence of the method. In this paper, we propose a new algorithm for improving the initializatio...


A New Initialization Method to Originate Initial Cluster Centers for K-Means Algorithm

K-means is the most popular partition-based algorithm and is widely used in data clustering. Many algorithms have been proposed for data clustering using K-means because of its simplicity, efficiency and ease of convergence. In spite of this, K-means has some drawbacks, such as sensitivity to the initial cluster centers and getting stuck in local optima. In this study, a new method is proposed to address...




Publication date: 2014